scaling, showing that only minibatch centering is required. Their work provides valuable
insight into the BNN training process. The experiments of Alizadeh et al. [2]
show that most of the tricks commonly used in training binary models, such as gradient
and weight clipping, are only required during the final stages of training to achieve the best
performance.
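For concreteness, the two tricks mentioned above can be sketched as follows; this is a minimal Python illustration with hypothetical thresholds, not code from [2].

import numpy as np

def clip_gradients(grad, threshold=1.0):
    # Gradient clipping: bound every gradient entry to [-threshold, threshold]
    # before the optimizer step.
    return np.clip(grad, -threshold, threshold)

def clip_latent_weights(w):
    # Weight clipping: keep the latent real-valued weights in [-1, 1], the
    # range of their binarized counterparts sign(w).
    return np.clip(w, -1.0, 1.0)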
XNOR-Net++ [26] provides a new training algorithm for 1-bit CNNs based on XNOR-
Net. Compared with XNOR-Net, this method fuses the activation and weight scaling
factors into a single factor that is learned discriminatively via backpropagation. They also
explore different ways to construct the shape of this scaling factor on the premise that the
computational budget remains fixed.
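A minimal sketch of this idea is given below, assuming a per-output-channel shape for the fused factor; the class name, initialization, and omission of the straight-through estimator are illustrative choices, not the exact XNOR-Net++ implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedScaleBinaryConv2d(nn.Module):
    # Binary convolution rescaled by a single learnable factor (one value per
    # output channel), replacing the analytically computed weight and
    # activation scaling factors of XNOR-Net.
    def __init__(self, in_ch, out_ch, k, stride=1, padding=0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, k, k))
        self.gamma = nn.Parameter(torch.ones(out_ch, 1, 1))  # learned by backpropagation
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # A straight-through estimator would be used in practice so that sign()
        # does not block gradients; it is omitted here for brevity.
        bw = torch.sign(self.weight)
        bx = torch.sign(x)
        out = F.conv2d(bx, bw, stride=self.stride, padding=self.padding)
        return out * self.gamma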
Borrowing an idea from the Alternating Direction Method of Multipliers (ADMM),
Leng et al. [128] decouple the continuous parameters from the discrete constraints of the
network and divide the original hard problem into several subproblems. These subproblems
are solved using extragradient and iterative quantization algorithms, leading to considerably
faster convergence than conventional optimization methods.
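In simplified notation (not taken verbatim from [128]), the splitting can be written as
\[
\min_{W,\,G}\; f(W) + I_{\mathcal{C}}(G) \quad \text{s.t. } W = G,
\]
where $f$ is the task loss, $\mathcal{C}$ the discrete constraint set (e.g., binary values), and $I_{\mathcal{C}}$ its indicator function. The corresponding augmented Lagrangian
\[
L_{\rho}(W, G, \mu) = f(W) + I_{\mathcal{C}}(G) + \mu^{\top}(W - G) + \frac{\rho}{2}\,\|W - G\|_{2}^{2}
\]
is then minimized alternately over the continuous variable $W$ (the extragradient step) and the discrete variable $G$ (iterative quantization), followed by a dual update of $\mu$.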
Deterministic Binary Filters (DBFs) [225] learn weighted coefficients of predefined or-
thogonal binary bases instead of the conventional approach, which directly learns the con-
volutional filters. The filters are formed as a linear combination of orthogonal binary
codes and can thus be generated very efficiently in real time.
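A minimal sketch of this construction (variable names and shapes are illustrative):

import numpy as np

def dbf_filter(basis, coeffs):
    # basis:  (N, k*k) predefined orthogonal binary codes with entries in {-1, +1}
    # coeffs: (N,) learned weighting coefficients
    # Returns a k x k convolutional filter as their linear combination.
    k = int(np.sqrt(basis.shape[1]))
    return (coeffs @ basis).reshape(k, k)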
BWNH [91] trains binary weight networks by hashing. They first reveal the strong
connection between inner-product preserving hashing and binary weight networks, showing
that training binary weight networks can be intrinsically regarded as a hashing problem.
They propose an alternating optimization method to learn the hash codes instead of directly
learning binary weights.
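The connection can be illustrated, in simplified notation rather than the exact objective of [91], as learning binary codes $B$ and a scaling $\alpha$ whose inner products with the inputs $X$ approximate those of the real-valued weights $W$:
\[
\min_{\alpha,\,B}\;\bigl\|X^{\top}W - \alpha\,X^{\top}B\bigr\|_{F}^{2},
\qquad B \in \{-1,+1\}^{m \times n},
\]
alternating between updating $\alpha$ with $B$ fixed and updating the binary codes $B$ with $\alpha$ fixed.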
CI-BCNN [239] learns BNNs with channel-wise interactions for efficient inference. Un-
like existing methods that directly apply XNOR and BITCOUNT operations, this method
learns an interacted bitcount according to the mined channel-wise interactions. The incon-
sistent signs in binary feature maps are corrected based on prior knowledge provided by
channel-wise interactions so that the information of the input images is preserved in the
forward propagation of BNNs. Specifically, they employ a reinforcement learning model to
learn a directed acyclic graph for each convolutional layer, representing implicit channel-wise
interactions. They obtain the interacted bitcount by adjusting the output of the original
bitcount in line with the effects exerted by the graph. They train the BCNN and the graph
structure simultaneously.
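A rough sketch of the adjustment step is given below, under a deliberately simplified parameterization of the interaction graph; the exact formulation in [239] differs.

import numpy as np

def interacted_bitcount(bitcount, adjacency, strength):
    # bitcount:  (C, H, W) raw per-channel XNOR-bitcount feature maps
    # adjacency: (C, C) 0/1 matrix of the learned directed acyclic graph,
    #            adjacency[i, j] = 1 meaning channel i influences channel j
    # strength:  (C, C) interaction strengths (hypothetical parameterization)
    out = bitcount.astype(float)
    C = bitcount.shape[0]
    for j in range(C):
        for i in range(C):
            if adjacency[i, j]:
                # correct channel j's response using the sign pattern of channel i
                out[j] += strength[i, j] * np.sign(bitcount[i])
    return out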
BinaryRelax [272] is a two-phase algorithm to train CNNs with quantized weights, in-
cluding binary weights. They relax the hard constraint into a continuous regularizer via the
Moreau envelope [176], namely the squared Euclidean distance to the set of quantized weights.
They gradually increase the regularization parameter to close the gap between the weights
and the quantized state. In the second phase, they introduce the exact quantization scheme
with a small learning rate to guarantee fully quantized weights.
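In the first phase, the relaxed objective therefore has the form (notation simplified)
\[
\min_{W}\; f(W) + \lambda\,\operatorname{dist}(W, \mathcal{Q})^{2},
\qquad \operatorname{dist}(W, \mathcal{Q}) = \min_{Q \in \mathcal{Q}} \|W - Q\|_{2},
\]
where $\mathcal{Q}$ is the set of quantized weights and $\lambda$ is gradually increased; in the second phase the regularizer is replaced by exact projection onto $\mathcal{Q}$.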
CBCNs [149] propose new circulant filters (CiFs) and a circulant binary convolution
(CBConv) to enhance the capacity of binarized convolutional features through circulant
backpropagation. A CiF is a 4D tensor of size K × K × H × H, generated based on a
learned filter and a circulant transfer matrix M. The matrix M here rotates the learned
filter at different angles. The original 2D H×H learned filter is modified to 3D by replicating
it three times and concatenating the copies to obtain the 4D CiF, as shown in Fig. 1.7. The method
can improve the representation capacity of BNNs without changing the model size.
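As a rough sketch, the rotation step might look as follows; the angle set and stacking order are illustrative, not the exact CiF construction of [149].

import numpy as np

def rotated_filter_stack(w2d, angles=(0, 90, 180, 270)):
    # w2d: learned H x H filter.  Rotated copies of the filter stand in for the
    # action of the circulant transfer matrix M; stacking them yields one
    # K x H x H slice of the CiF.
    return np.stack([np.rot90(w2d, k=a // 90) for a in angles])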
Rectified binary convolutional networks (RBCNs) [148] use a generative adversarial net-
work (GAN) to train the 1-bit network under the guidance of its corresponding full-
precision model, which significantly improves the performance of 1-bit CNNs. The rectified
convolutional layers are generic and flexible and can be easily incorporated into existing
DCNNs such as WideResNets and ResNets.
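A simplified sketch of the adversarial objective is given below; the loss terms and their weighting in [148] are more elaborate.

import numpy as np

def rbcn_losses(feat_fp, feat_bin, discriminator, task_loss):
    # feat_fp:  features from the full-precision model
    # feat_bin: features from the 1-bit network
    # discriminator: maps features to the probability of being full-precision
    d_fp, d_bin = discriminator(feat_fp), discriminator(feat_bin)
    # the discriminator learns to tell full-precision features from binary ones
    loss_d = -np.log(d_fp + 1e-8) - np.log(1.0 - d_bin + 1e-8)
    # the 1-bit network minimizes its task loss plus a term that pushes its
    # features toward the full-precision model's (i.e., fools the discriminator)
    loss_g = task_loss - np.log(d_bin + 1e-8)
    return loss_g, loss_d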